Clustering in complex networks. I. General formalism
We develop a full theoretical approach to clustering in complex networks. A key concept is
introduced, the edge multiplicity, that measures the number of triangles passing through an edge.
This quantity extends the clustering coefficient in that it involves the properties of two, and not
just one, vertices. The formalism is completed with the definition of a three-vertex correlation
function, which is the fundamental quantity describing the properties of clustered networks. The
formalism suggests new metrics that are able to thoroughly characterize transitive relations. A
rigorous analysis of several real networks, which makes use of the new formalism and the new
metrics, is also provided. It is also found that clustered networks can be classified into two main
groups: the weak and the strong transitivity classes. In the first class, edge multiplicity is small, with
triangles being disjoint. In the second class, edge multiplicity is high and so triangles share many
edges. As we shall see in the following paper, the class a network belongs to has strong implications
for its percolation properties.

PACS numbers: 89.75.-k, 87.23.Ge, 05.70.Ln



I. INTRODUCTION 

The important role of transitive relations in complex interaction
systems has been recognized since the work of Georg Simmel, a popular
19th-century German sociologist who pointed out the interest of triads
in a pioneering work on the concept of social structure. Simmel
understood society as a web of patterned interactions and focused on
the study of the forms of these interactions as they occur and reoccur
in diverse historical periods and cultural settings. His emphasis on
quantitative aspects led him to analyze, in particular, dyadic versus
triadic relationships, and to find that when a dyad is formed into a
triad, the apparently insignificant fact that one member has been added
actually brings about a major qualitative change, with various actions
and processes becoming possible where previously they could not take
place. The triad is then seen as the simplest structure in which the
group as a whole can achieve domination over its component members, and
so becomes the scenario exhibiting the simplest expression of the
sociological drama.

In the study of complex networks, where large systems of interactions
are mapped into comprehensible graphs, just vertices and edges are
nevertheless usually recognized as the primary building blocks.
Vertices represent the elementary units under mutual influence, and the
interactions are modeled by edges linking them. Transitive relations,
represented in this scheme by triangles, then arise as a secondary form
of basic organization, made up of vertices connected by edges. However,
the empirical evidence of a large number of triangles, well above
random expectations, in the vast majority of real networks has drawn
attention to this structure. A first reference to transitivity appeared
in the literature of complex networks in the form of the clustering
coefficient, a scalar measure quantifying the total number of triangles
in a network through the average likelihood that two neighbors of a
vertex are neighbors themselves. Triangles in complex networks are
indissolubly tied to the analysis of degree correlations, and they have
been recognized as a fundamental element in the composition of
recurring subgraphs, the so-called motifs, closely related to the
large-scale organization of complex networks, their functionality, or
their community structure. So, in the framework of complex network
science, they have to be taken into account as a basic object, whose
presence and self-organization can drastically impact network structure
and properties.

In this paper and the following one, we develop a full theoretical
approach to clustering in complex networks on the basis of former work.
It is extended and completed with novel results previously unreported,
which lead to a substantially improved understanding of how clustering
can be measured and how far its effects reach. In this paper, we begin
by presenting in the next three sections the ways of measuring
clustering at different depth levels. In Section II we review, as a
technical introduction and for completeness, the standard local and
global measures related to one-vertex clustering. In Section III we ask
for the properties of not just one but two of the vertices involved in
the triangles, and to this end we review the concept of dyadic
clustering from the definitions of edge multiplicity and edge
clustering. Section IV treats the case of triadic clustering. In
particular, we propose a new measure, the average nearest neighbors
multiplicity $\bar{m}_{nn}(k)$, to compute triadic clustering in a
practical way. In Section V we explore the effects of degree-degree
correlations on clustering. We find analytically that degree-degree
correlations constrain the functional form of clustering and its
maximum level. We also examine some empirical networks, finding good
agreement with our predictions. Section VI explores the condition for
the simultaneous absence of degree correlations at the level of
triangles and edges, which makes it necessary to discriminate between
weak and strong clustering. Finally, conclusions are drawn in
Section VII. In this way, this first paper lays out the general
formalism. The following one will focus on percolation properties.



II. ONE-VERTEX CLUSTERING

In the context of complex networks, the concept of clustering was
introduced as a way to quantify the transitivity of the connections.
Several alternative definitions have been proposed, from global scalar
quantities associated to the whole network to local measures describing
the properties of single nodes. This is the case of the clustering
coefficient first introduced by Watts and Strogatz,

$$ c_i = \frac{2 T_i}{k_i (k_i - 1)}, \qquad (1) $$

where $T_i$ is the number of triangles passing through vertex $i$ and
$k_i$ is its degree. They also pointed out that real networks display a
level of clustering (measured as the average of $c_i$ over the set of
vertices in the network, the so-called clustering coefficient $C$,
which varies in the interval [0, 1]) typically much larger than that
produced by random effects.

The local clustering $c_i$ gives highly detailed information from a
purely local perspective. One can adopt a compromise between the global
property defined by $C$ and the full local information given by $c_i$
by defining an average of $c_i$ over the set of vertices of a given
degree class, that is,

$$ c(k) = \frac{1}{N_k} \sum_{i \in \Upsilon(k)} c_i = \frac{2}{k(k-1) N_k} \sum_{i \in \Upsilon(k)} T_i, \qquad (2) $$

where $N_k$ is the number of vertices of degree $k$ and $\Upsilon(k)$
is the set of such vertices. The corresponding scalar measure is called
the mean clustering coefficient and can be computed on the basis of the
degree distribution $P(k)$ as

$$ \bar{c} = \sum_k P(k) c(k), \qquad (3) $$

which is related to the clustering coefficient as
$C = \bar{c}/(1 - P(0) - P(1))$. In fact, in $\bar{c}$ we have
implicitly assumed that c(k = 0) = c(k = 1) = 0, whereas in the
definition of $C$ we only consider an average over the set of vertices
with degree k > 1. This fact explains the difference between both
measures.
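As an illustration of the one-vertex measures above, the following minimal Python sketch computes $c_i$, $c(k)$, $\bar{c}$, and $C$ for a small graph stored as a dictionary of neighbor sets. The function names and the five-vertex example graph are assumptions made for illustration only; they are not part of the original analysis.

```python
from collections import defaultdict
from fractions import Fraction


def local_clustering(adj):
    """Return {i: c_i} with c_i = 2 T_i / (k_i (k_i - 1)), and c_i = 0 if k_i < 2."""
    c = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        # Summing the common neighbours of i and each neighbour j counts every
        # triangle through i twice, so this sum equals 2 T_i.
        two_T = sum(len(adj[j] & nbrs) for j in nbrs)
        c[i] = Fraction(two_T, k * (k - 1)) if k > 1 else Fraction(0)
    return c


def degree_clustering(adj):
    """Return {k: c(k)}, the average of c_i over the vertices of degree k (Eq. 2)."""
    c = local_clustering(adj)
    classes = defaultdict(list)
    for i, nbrs in adj.items():
        classes[len(nbrs)].append(c[i])
    return {k: sum(vals) / len(vals) for k, vals in classes.items()}


def scalar_clustering(adj):
    """Return (c_bar, C): c_bar averages over all vertices, C only over k > 1."""
    c = local_clustering(adj)
    total = sum(c.values())
    n_gt1 = sum(1 for nbrs in adj.values() if len(nbrs) > 1)
    return total / len(adj), total / n_gt1


# Hypothetical example: two triangles sharing the edge 0-1, plus a pendant vertex 4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1, 4}, 4: {3}}
c_bar, C = scalar_clustering(adj)
p_low = Fraction(1, 5)  # P(0) + P(1): one vertex out of five has degree 1
assert C == c_bar / (1 - p_low)
```

Exact fractions are used so that the relation $C = \bar{c}/(1 - P(0) - P(1))$ can be checked without rounding error on this toy graph.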

In the case of uncorrelated networks, c(k) is independent of k.
Furthermore, all the measures collapse and reduce to

$$ C = \frac{1}{N} \frac{(\langle k^2 \rangle - \langle k \rangle)^2}{\langle k \rangle^3}. \qquad (4) $$

Therefore, a functional dependence of c(k) on the degree can be
attributed to the presence of correlations. Indeed, it has been
observed that many real networks exhibit a power-law behavior
$c(k) \sim k^{-\alpha}$, with typically $0 \le \alpha \le 1$. Hence,
the degree-dependent clustering coefficient has been proposed as a
measure of hierarchical organization and modularity in complex
networks.



III. DYADIC CLUSTERING 

The degree dependent clustering coefficient described 
in the previous section measures the transitivity of a ver- 
tex that participates in a triangle and, in this sense, it is 
a projection over one vertex of a structure that involves 
three vertices. Then, it is natural to ask for the prop- 
erties of not just one but two of the vertices involved in 
the triangle or, equivalently, to ask for the properties of 
edges involved in triangles. 

To do so, let us define the multiplicity of an edge, $m_{ij}$, as the
number of triangles in which the edge connecting vertices i and j
participates. This quantity is the analog for edges of the number of
triangles attached to a vertex, $T_i$. The two quantities are related
through the trivial identity

$$ \sum_j m_{ij} a_{ij} = 2 T_i, \qquad (5) $$

which is valid for any network configuration. The matrix $a_{ij}$ is
the adjacency matrix, giving the value 1 if there is an edge between
vertices i and j and 0 otherwise.
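The relation between edge multiplicity and the number of triangles per vertex can be checked directly on a small graph. The sketch below is only an illustration: the helper names and the four-vertex example (two triangles sharing one edge) are assumptions, and the multiplicity of an edge is computed as the number of common neighbors of its two end vertices.

```python
def edge_multiplicity(adj):
    """Return {(i, j): m_ij} for every edge, stored once with i < j."""
    return {
        (i, j): len(adj[i] & adj[j])  # each common neighbour closes one triangle
        for i in adj
        for j in adj[i]
        if i < j
    }


def triangles_through(adj, i):
    """T_i: number of connected pairs among the neighbours of vertex i."""
    return sum(len(adj[j] & adj[i]) for j in adj[i]) // 2


# Hypothetical example: triangles {0, 1, 2} and {0, 1, 3} share the edge 0-1.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
m = edge_multiplicity(adj)

for i in adj:
    # Sum of m_ij over the neighbours j of i, i.e. sum_j m_ij a_ij.
    lhs = sum(m[(min(i, j), max(i, j))] for j in adj[i])
    assert lhs == 2 * triangles_through(adj, i)
```

The factor 2 appears because every triangle through a vertex uses two of the edges incident to that vertex.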

Again, $m_{ij}$ is a local measure defined for every edge. We can
coarse-grain and define the average multiplicity of the edges
connecting the degree classes k and k', $m_{kk'}$, as

$$ m_{kk'} = \frac{\sum_{i \in \Upsilon(k)} \sum_{j \in \Upsilon(k')} m_{ij} a_{ij}}{E_{kk'}}, \qquad (6) $$

where $E_{kk'}$ stands for the number of edges between those degree
classes (two times that number if k = k'). The multiplicity matrix
$m_{kk'}$ is defined in the range $[0, m^c_{kk'}]$, where
$m^c_{kk'} = \min(k, k') - 1$, and it represents a measure of dyadic
clustering that gives a more detailed description than c(k) of how
triangles are shared among vertices of different degrees. Furthermore,
as we shall see in the following paper, it contains relevant
information to analyze the percolation properties of clustered
networks.

TABLE I: Empirical values of the average multiplicity $\bar{m}$, the
maximum multiplicity $m_{max}$, and the clustering coefficient $C$
for different real networks.

Network      $\bar{m}$   $m_{max}$   $C$
PIN            0.30         10       0.12
AS             2.55        537       0.45
PGP            3.31         94       0.50
E-mail         3.90         28       0.27
Coauthors      4.22         74       0.74
WTW           27.13        163       0.66

Now, it is possible to find a relation between multiplicity and
clustering. Taking into account the fact that the joint degree
distribution can be defined as
$P(k, k') = \lim_{N \to \infty} E_{kk'} / 2E$, with $E$ the total
number of edges in the network, we obtain the following closure
condition at the class level

$$ \sum_{k'} m_{kk'} P(k, k') = k(k-1) c(k) \frac{P(k)}{\langle k \rangle}. \qquad (7) $$


Let us emphasize that this equation is an identity fulfilled by any
network, which ties it with the degree detailed balance condition.
These identities are important because, given their universal nature,
they can be used to derive properties of networks regardless of their
specific details. As an example, the detailed balance condition has
been used to prove the absence of an epidemic threshold in scale-free
networks.

A global scalar measure can also be defined for dyadic clustering. It
is the average multiplicity of the network, obtained by averaging
$m_{kk'}$ over all degree classes,

$$ \bar{m} = \sum_k \sum_{k'} m_{kk'} P(k, k') = \frac{\langle k(k-1) c(k) \rangle}{\langle k \rangle}. \qquad (8) $$



Values of $\bar{m}$ close to zero mean that there are no triangles.
When $\bar{m} \lesssim 1$, triangles are mostly disjoint and their
number can be approximated as $T(k) \lesssim k/2$. Otherwise, when
$\bar{m} \gg 1$, triangles jam into edges, with many triangles sharing
the same edge. Table I shows empirical values for the average
multiplicity $\bar{m}$, the maximum multiplicity $m_{max}$, and the
clustering coefficient $C$ for different real networks. These are the
Internet at the autonomous system level (AS) [20], the protein
interaction network of the yeast S. cerevisiae (PIN) [21], an
intra-university e-mail network [22], the web of trust of PGP [23], the
network of co-authorship among academics [24, 25], and the world trade
web (WTW) of trade relationships among countries [26]. In all cases,
except for the PIN network, the value of $\bar{m}$ indicates a
noticeable jamming of triangles into edges.
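The class-level quantities of this section can be computed for any small graph, which also gives a direct numerical check of the closure condition, Eq. (7), and of the average multiplicity, Eq. (8). The sketch below is an illustration under stated assumptions: the function names and the five-vertex example graph are hypothetical, and the joint degree distribution is estimated on a finite graph as $E_{kk'}/2E$.

```python
from collections import defaultdict
from fractions import Fraction


def class_statistics(adj):
    """Return (P, m): the joint distribution P(k,k') and the multiplicity m_kk'."""
    deg = {i: len(nbrs) for i, nbrs in adj.items()}
    E = defaultdict(int)  # E_kk'; edges inside one degree class are counted twice
    M = defaultdict(int)  # sum of m_ij over the same ordered pairs
    for i, nbrs in adj.items():
        for j in nbrs:  # every edge is visited once in each direction
            E[(deg[i], deg[j])] += 1
            M[(deg[i], deg[j])] += len(adj[i] & adj[j])
    two_E = sum(E.values())
    P = {key: Fraction(n, two_E) for key, n in E.items()}
    m = {key: Fraction(M[key], n) for key, n in E.items()}
    return P, m


def closure_sides(adj, k):
    """Left- and right-hand sides of Eq. (7) for a degree class k > 1."""
    P, m = class_statistics(adj)
    lhs = sum(m[key] * P[key] for key in P if key[0] == k)
    members = [i for i, nbrs in adj.items() if len(nbrs) == k]
    # c(k) is the class average of c_i = 2 T_i / (k (k - 1)).
    c_k = sum(
        Fraction(sum(len(adj[j] & adj[i]) for j in adj[i]), k * (k - 1))
        for i in members
    ) / len(members)
    n = len(adj)
    mean_k = Fraction(sum(len(nbrs) for nbrs in adj.values()), n)
    rhs = k * (k - 1) * c_k * Fraction(len(members), n) / mean_k
    return lhs, rhs


# Hypothetical example: two triangles sharing the edge 0-1, plus a pendant vertex 4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1, 4}, 4: {3}}
P, m = class_statistics(adj)
m_bar = sum(m[key] * P[key] for key in P)  # Eq. (8)
for k in (2, 3):
    lhs, rhs = closure_sides(adj, k)
    assert lhs == rhs
```

Visiting each edge in both directions reproduces the convention that $E_{kk'}$ counts edges between two equal degree classes twice, so the entries of `P` sum to one.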

An alternative way to quantify dyadic clustering is by means of the
edge clustering coefficient, defined as

$$ c(k, k') = \frac{m_{kk'}}{m^c_{kk'}}. \qquad (9) $$



The advantage of using the normalized version c(k, k') instead of
$m_{kk'}$ is that the edge clustering coefficient admits a
probabilistic interpretation. Indeed, the one-vertex clustering
coefficient c(k) can be viewed as the probability that two neighbors of
a vertex of degree k are connected. In turn, c(k, k') can be
interpreted as the probability that two connected vertices of degrees
k and k' share a common neighbor.
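A minimal sketch of how the edge clustering coefficient could be measured on a graph is given below. It assumes the normalization $\min(k, k') - 1$ stated in the text, works edge by edge, and then averages within each pair of degree classes; since the normalization is constant within a class, this equals the ratio of Eq. (9). The function name, the choice of skipping classes with $\min(k, k') = 1$ (where the ratio is undefined), and the example graph are assumptions for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def edge_clustering(adj):
    """Return {(k, k'): c(k,k')} with k <= k', skipping classes with min(k,k') = 1."""
    deg = {i: len(nbrs) for i, nbrs in adj.items()}
    ratios = defaultdict(list)
    for i, nbrs in adj.items():
        for j in nbrs:
            if i < j:  # visit every edge once
                m_max = min(deg[i], deg[j]) - 1  # largest multiplicity the edge can hold
                if m_max > 0:
                    m_ij = len(adj[i] & adj[j])
                    key = tuple(sorted((deg[i], deg[j])))
                    ratios[key].append(Fraction(m_ij, m_max))
    return {key: sum(vals) / len(vals) for key, vals in ratios.items()}


# Hypothetical example: two triangles sharing the edge 0-1, plus a pendant vertex 4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1, 4}, 4: {3}}
c_edge = edge_clustering(adj)
```

In this toy graph the edges between degree-2 and degree-3 vertices hold as many triangles as they can, whereas the edges between degree-3 vertices do so only partially.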



IV. TRIADIC CLUSTERING 

Clustering is a measure of three-point correlations, although this is
not evident from the definitions of one-vertex clustering and dyadic
clustering, respectively calculated as c(k) and $m_{kk'}$. To clarify
this point, we use an approach similar to the one followed when
analyzing two-point correlations. In that case, we made use of the
matrix $E_{kk'}$, which counts the number of edges among different
degree classes, to define the joint degree distribution P(k, k'),
giving the probability that a randomly chosen edge of the network
connects two vertices of degrees k and k'. In the case of triadic
clustering, the fundamental object is no longer the edge but the
triangle itself. Thus, let us define a completely symmetric tensor
$T_{kk'k''}$, which measures the number of triangles connecting
vertices of the degree classes k, k', and k'' when
$k \neq k' \neq k''$, two times the number of triangles when two of the
indices are equal, and six times the number of triangles when the three
indices are equal. This tensor satisfies the following identity

$$ \sum_{k'} \sum_{k''} T_{kk'k''} = 2 \sum_{i \in \Upsilon(k)} T_i = k(k-1) c(k) P(k) N. \qquad (10) $$



Then, we can define a joint distribution

$$ Q(k, k', k'') = \frac{T_{kk'k''}}{\langle k \rangle \bar{m} N}, \qquad (11) $$

which measures the probability that a randomly chosen triangle connects
three vertices of degrees k, k', and k''. The one-point marginal
distribution is in this case

$$ Q(k) = \sum_{k'} \sum_{k''} Q(k, k', k'') = \frac{k(k-1) c(k) P(k)}{\langle k \rangle \bar{m}}. \qquad (12) $$



The two-point marginal distribution
$Q(k, k') = \sum_{k''} Q(k, k', k'')$ has an interesting
interpretation. Indeed, it measures properties of the degrees of
connected vertices and, in this sense, it is similar to P(k, k'). The
main difference between both distributions is the way in which edges
are selected. In the case of P(k, k'), an edge is randomly chosen and
then one asks for the degrees at the ends of such edge. This selection
mechanism implies that all edges in the network have the same
probability to be chosen. In the case of Q(k, k'), one first selects a
triangle with uniform probability among all the triangles present in
the network and, once the triangle has been selected, one of its edges
is randomly chosen. Then, the degrees of the vertices attached to this
edge are measured. If an edge is shared by more than one triangle, this
edge will be selected more often than edges that participate in no
triangle or in just one. This implies that, in this case, edges are
chosen with a nonuniform probability which is proportional to their
multiplicity (Fig. 1 sketches this selection mechanism). This allows us
to write

$$ Q(k, k') = \frac{m_{kk'} P(k, k')}{\bar{m}}. \qquad (13) $$



Equations (12) and (13) eventually complete the fundamental functions
which describe transitivity properties in complex networks. Indeed,
since clustering is a property that involves three distinct vertices,
the most complete description is given by the function Q(k, k', k''),
which refers to triadic clustering. Nevertheless, when not that much
information is required, we can work with the two-vertex marginal
distribution Q(k, k'). However, by doing so, a new quantity encoding
the kernel of dyadic clustering, the multiplicity $m_{kk'}$, naturally
appears, accounting for the fact that edges can participate in more
than one triangle. If we are interested in single vertices only, we are
led to the one-vertex marginal distribution Q(k) which, again,
introduces in a natural way the concept of clustering coefficient c(k).
All this means that, in fact, the functions c(k) and $m_{kk'}$ are just
projections of the same fundamental object, described by Q(k, k', k'').

Dealing in practice with the three-variable function Q(k, k', k'') when
studying triadic clustering is a rather complex task. A practical way
to quantify triadic clustering requires the introduction of a new
measure. To this end we propose to quantify the average multiplicity of
edges among nearest neighbors in triangles attached to a vertex of
degree k, which we call average nearest neighbors multiplicity
$\bar{m}_{nn}(k)$ by analogy with the average nearest neighbors degree,
the function $\bar{k}_{nn}(k)$ [27]. To compute $\bar{m}_{nn}(k)$ in a
formal way, we first define the transition probability

$$ Q(k', k''|k) = \frac{\langle k \rangle \bar{m} Q(k, k', k'')}{k(k-1) c(k) P(k)}, \qquad (14) $$

from where we can write

$$ \bar{m}_{nn}(k) = \sum_{k', k''} m_{k'k''} Q(k', k''|k). \qquad (15) $$

As in the case of $\bar{k}_{nn}(k)$, in the absence of correlations
among the degrees of vertices forming triangles, the function
Q(k', k''|k) is independent of k, and so is the average nearest
neighbors multiplicity. Therefore, any non-trivial dependence of
$\bar{m}_{nn}(k)$ on k will signal the presence of correlations between
the three degrees of the nodes that form triangles.
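One possible way to evaluate the average nearest neighbors multiplicity on a small graph is sketched below. It builds the triangle tensor by brute-force enumeration of vertex triples, which is only practical for toy examples, obtains the transition probability as the tensor normalized over its last two indices (equivalent to dividing the joint distribution by its one-point marginal), and then applies Eq. (15) with the class-averaged multiplicity. The function name and the example graph are assumptions for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations, permutations


def average_nn_multiplicity(adj):
    """Return {k: m_nn(k)} for degree classes that take part in triangles."""
    deg = {i: len(nbrs) for i, nbrs in adj.items()}

    # Class-averaged multiplicity m_kk' (every edge visited in both directions).
    E, M = defaultdict(int), defaultdict(int)
    for i, nbrs in adj.items():
        for j in nbrs:
            E[(deg[i], deg[j])] += 1
            M[(deg[i], deg[j])] += len(adj[i] & adj[j])
    m = {key: Fraction(M[key], E[key]) for key in E}

    # Triangle tensor: one count per ordering of the three vertices, which
    # yields the factors 1, 2 and 6 for zero, two and three equal degrees.
    T = defaultdict(int)
    for tri in combinations(sorted(adj), 3):
        if all(b in adj[a] for a, b in combinations(tri, 2)):
            for a, b, c in permutations(tri):
                T[(deg[a], deg[b], deg[c])] += 1

    # Q(k',k''|k) is T normalised over (k', k''); Eq. (15) is the weighted sum.
    norm, weighted = defaultdict(int), defaultdict(Fraction)
    for (k, k1, k2), count in T.items():
        norm[k] += count
        weighted[k] += m[(k1, k2)] * count
    return {k: weighted[k] / norm[k] for k in norm}


# Hypothetical example: triangles {0, 1, 2} and {0, 1, 3} share the edge 0-1.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
m_nn = average_nn_multiplicity(adj)
```

In this example a vertex of degree 2 faces the shared edge, which belongs to two triangles, while a vertex of degree 3 faces edges that belong to a single triangle.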

In Fig. 2 we show measures of this function for the different real
networks analyzed throughout the paper. As one can see, the patterns
follow closely those for the average nearest neighbors degree, that is,
networks with assortative degree mixing also show an increasing
$\bar{m}_{nn}(k)$, whereas disassortative ones show decreasing
dependencies as a function of k. This can be intuitively understood if
we consider that the function $\bar{k}_{nn}(k)$ appears to be an upper
bound of $\bar{m}_{nn}(k)$. Despite this similarity, we also find
differences in the behavior of $\bar{m}_{nn}(k)$ as compared to
$\bar{k}_{nn}(k)$. In the case of the Internet at the autonomous system
level, we find that both functions follow a power law decay as a
function of k but clearly with different exponents. In the case of the
protein interaction network, $\bar{m}_{nn}(k)$ is approximately
constant whereas $\bar{k}_{nn}(k)$ is a decreasing function of k.

FIG. 1: This figure illustrates the different information encoded in
the functions P(k, k') and Q(k, k'). In this simple graph, there are
two kinds of vertices, two of them with degree k = 2 and the other two
with degree k = 3. Each edge is labeled with its multiplicity. The
function $(2 - \delta_{kk'}) P(k, k')$ gives the probability that a
randomly chosen edge connects two vertices of degrees k and k',
respectively. Then, the probability that a randomly chosen edge
connects two vertices of degrees k = 2 and k = 3 is 4/5 and the
probability of connecting two vertices of degree k = 3 is 1/5. In the
case of Q(k, k'), we first have to choose a triangle at random (either
triangle 1 or triangle 2 in this example) and from this triangle one of
its edges is randomly chosen. Using this procedure, the probability
that an edge connects two vertices of degrees k = 2 and k = 3 is 2/3,
which corresponds to first choosing one triangle (with probability 1/2
each in this particular graph) and then one edge (with probability
1/3). Analogously, the probability of connecting two vertices of degree
k = 3 is 1/3. Then we can write P(2, 3) = P(3, 2) = 2/5, P(3, 3) = 1/5
and Q(2, 3) = Q(3, 2) = 2/6, Q(3, 3) = 2/6.
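The numbers quoted in the caption of Fig. 1 can be reproduced with a few lines of Python. The sketch below encodes the four-vertex graph of the figure (the vertex labels are an arbitrary assumption) and mimics the two selection mechanisms: uniform choice of an edge for P(k, k'), and uniform choice of a triangle followed by uniform choice of one of its edges for Q(k, k').

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Graph of Fig. 1: vertices 0 and 1 have degree 3, vertices 2 and 3 have degree 2.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
deg = {i: len(nbrs) for i, nbrs in adj.items()}


def normalize(counts):
    """Turn a Counter of occurrences into exact probabilities."""
    total = sum(counts.values())
    return {key: Fraction(n, total) for key, n in counts.items()}


# P(k,k'): pick an edge uniformly, then read the degrees at its two ends.
# Each edge is seen in both directions, so P is symmetric in (k, k').
P = normalize(Counter((deg[i], deg[j]) for i in adj for j in adj[i]))

# Q(k,k'): pick a triangle uniformly, then one of its three edges uniformly.
triangles = [
    t for t in combinations(sorted(adj), 3)
    if all(b in adj[a] for a, b in combinations(t, 2))
]
triangle_edge_counts = Counter()
for t in triangles:
    for a, b in combinations(t, 2):
        triangle_edge_counts[(deg[a], deg[b])] += 1
        triangle_edge_counts[(deg[b], deg[a])] += 1
Q = normalize(triangle_edge_counts)

assert P[(2, 3)] == P[(3, 2)] == Fraction(2, 5) and P[(3, 3)] == Fraction(1, 5)
assert Q[(2, 3)] == Q[(3, 2)] == Fraction(2, 6) and Q[(3, 3)] == Fraction(2, 6)
```

The shared edge between the two degree-3 vertices is counted once per triangle, which is how its multiplicity of 2 raises its weight from 1/5 in P to 1/3 in Q.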



V. EFFECTS OF DEGREE-DEGREE 
CORRELATIONS ON CLUSTERING 

FIG. 2: Empirical measures of the average nearest neighbors
multiplicity as a function of k compared with the average nearest
neighbors degree.

Degree-degree correlations constrain the maximum level of clustering a
network can reach. A naive explanation for this is that, if the
neighbors of a given node all have a small degree, the number of
connected neighbors (and hence the clustering of such a node) will be
bounded. This is the main idea behind the new measure of clustering
introduced in [28]. However, we can go a step further and quantify this
effect analytically. The key point is to realize that the multiplicity
matrix satisfies the inequality

$$ m_{kk'} \le \min(k, k') - 1, \qquad (16) $$

which comes from the fact that the degrees of the nodes at the ends of
an edge determine the maximum number of triangles this edge can hold.
Multiplying this inequality by P(k, k') and summing over k' we get

$$ k(k-1) c(k) \frac{P(k)}{\langle k \rangle} \le \sum_{k'} \min(k, k') P(k, k') - \frac{k P(k)}{\langle k \rangle}, \qquad (17) $$

where we have used the closure condition Eq. (7). This inequality, in
turn, can be rewritten as

$$ c(k) \le 1 - \frac{1}{k-1} \sum_{k'=1}^{k} (k - k') P(k'|k) \equiv \lambda(k). \qquad (18) $$
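The algebra behind this rewriting can be made explicit. The sketch below assumes the usual definition of the conditional probability, $P(k'|k) = \langle k \rangle P(k, k') / (k P(k))$, which is not written out in the text, together with its normalization over k'.

```latex
\begin{align}
(k-1)\,c(k) &\le \sum_{k'} \min(k,k')\,P(k'|k) - 1
  && \text{(inequality (17) divided by } kP(k)/\langle k\rangle\text{)} \\
\sum_{k'} \min(k,k')\,P(k'|k)
  &= \sum_{k'\le k} k'\,P(k'|k) + k\sum_{k'>k} P(k'|k) \\
  &= k - \sum_{k'=1}^{k} (k-k')\,P(k'|k)
  && \text{(using } \textstyle\sum_{k'} P(k'|k) = 1\text{)} \\
c(k) &\le 1 - \frac{1}{k-1}\sum_{k'=1}^{k} (k-k')\,P(k'|k)
\end{align}
```

Only neighbors with degree below k contribute to the sum, which is why low-degree neighbors are the ones that reduce the bound.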



Notice that $\lambda(k)$ is always in the interval [0, 1] and,
therefore, c(k) is always bounded by a function smaller than (or equal
to) 1. In the limit of very large values of k, Eq. (18) reads

$$ c(k) \le \lambda(k) \approx \frac{\bar{k}^r_{nn}(k) - 1}{k - 1}, \qquad (19) $$

where $\bar{k}^r_{nn}(k)$ is the average nearest neighbors degree of a
vertex with degree k. The superscript r (for reduced) refers to the
fact that it is evaluated only up to k and, therefore,
$\bar{k}^r_{nn}(k) \le k$. For strongly assortative networks
$\bar{k}^r_{nn}(k) \sim k$, so that $\lambda(k) \sim O(1)$ and there is
no restriction on the decay of c(k). In the opposite case of
disassortative networks, the sum term in the right hand side of
Eq. (18) may be fairly large and then the clustering coefficient will
have to decay accordingly.
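A numerical check of the bound on a small graph may help fix ideas. The sketch below evaluates c(k) and the bound of Eq. (18) for every degree class with k > 1, estimating the conditional probability P(k'|k) as the fraction of edge ends attached to degree-k vertices whose other end has degree k'. The function name and the five-vertex example are assumptions for illustration only.

```python
from collections import defaultdict
from fractions import Fraction


def clustering_bound(adj):
    """Return {k: (c(k), bound)} for every degree class with k > 1."""
    deg = {i: len(nbrs) for i, nbrs in adj.items()}
    E = defaultdict(int)      # E_kk', built from both directions of each edge
    two_T = defaultdict(int)  # sum over each degree class of 2 T_i
    size = defaultdict(int)   # N_k
    for i, nbrs in adj.items():
        size[deg[i]] += 1
        for j in nbrs:
            E[(deg[i], deg[j])] += 1
            two_T[deg[i]] += len(adj[i] & adj[j])
    result = {}
    for k, n_k in size.items():
        if k < 2:
            continue
        c_k = Fraction(two_T[k], k * (k - 1) * n_k)
        # P(k'|k) = E_kk' / (k N_k); only neighbours with k' <= k contribute.
        deficit = sum(
            ((k - k2) * Fraction(n, k * n_k)
             for (k1, k2), n in E.items() if k1 == k and k2 <= k),
            Fraction(0),
        )
        result[k] = (c_k, 1 - deficit / (k - 1))
    return result


# Hypothetical example: two triangles sharing the edge 0-1, plus a pendant vertex 4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1, 4}, 4: {3}}
bounds = clustering_bound(adj)
assert all(c_k <= lam for c_k, lam in bounds.values())
```

In this example the degree-2 class saturates the bound, while the pendant vertex attached to vertex 3 lowers both the clustering and the bound of the degree-3 class.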

It is important to mention that, although $\lambda(k)$ is an upper
bound of c(k), it is not the lowest upper bound. In fact, in the
inequality Eq. (16) we are not considering that the neighbors of the
two vertices of degrees k and k' might not have enough free
connections. An obvious case corresponds to vertices of degree 1. If
the vertex of degree k has N(1|k) neighbors of degree 1 (other than the
one of degree k') and the one of degree k' has N(1|k') neighbors of
degree 1, then the corrected inequality would be

$$ m_{kk'} \le \min(k - N(1|k), k' - N(1|k')) - 1. \qquad (20) $$

The problem is that, now, N(1|k) is a stochastic quantity with expected
value $\langle N(1|k) \rangle = (k-1) P(1|k)$ which, again, depends on
the mixing properties of the network. This contribution is important in
networks with a large number of vertices of degree 1.

FIG. 3: Clustering c(k) versus the maximum value $\lambda(k)$ for
several real networks. In all cases, empirical measures fall below the
diagonal line, validating the inequality Eq. (18).

FIG. 4: Empirical measures of the ratio between the clustering
coefficient c(k) and the maximum value $\lambda(k)$ for different real
networks.



The interplay between degree correlations and clustering can also be
observed in real networks. We have measured the functions $\lambda(k)$
and c(k) for several empirical data sets, finding that the inequality
Eq. (18) is always satisfied. In Fig. 3 we plot the clustering
coefficient c(k) as a function of $\lambda(k)$. Each dot in these
figures corresponds to a different degree class. As clearly seen, in
all cases the empirical measures lie below the diagonal line, which
indicates that the inequality Eq. (18) is always preserved. In Fig. 4
we show the ratio $c(k)/\lambda(k)$. The rate of variation of this
fraction is small and, thus, the degree-dependent clustering
coefficient can be computed as $c(k) = \lambda(k) f(k)$, where f(k) is
a slowly varying function of k that, in many cases, can be fitted by a
logarithmic function. This result implies that, to a large extent, the
functional dependence of c(k) is given by the particular shape of the
degree-degree correlations. On the other hand, this also suggests that
the edge clustering coefficient, given by Eq. (9), is also a weakly
dependent (if not independent) function of the degrees k and k'.
Indeed, empirical measures of c(k, k') in the studied real networks
support this conjecture. Figure 5 shows contour plots of c(k, k') using
a logarithmic binning of the axes. In all cases, there is a dominant
color, which indicates that the edge clustering is approximately
constant. As expected from the information shown in Fig. 4, the AS
network and the Coauthor network are the least constant, although the
variation of c(k, k') across different (k, k') domains is not very
pronounced. This result is particularly important because, unlike what
happens for the degree-dependent clustering coefficient c(k), it allows
one to approximate dyadic clustering in many cases by a constant value.



A. Scale-free networks 

FIG. 5: Contour plots of the edge clustering coefficient c(k, k') as a
function of k and k' for the different real networks analyzed.

Scale-free networks with degree distributions of the form
$P(k) \sim k^{-\gamma}$ belong to a special class of networks which
deserves a separate discussion. Indeed, it has been shown that, when
the exponent of the degree distribution lies in the interval
$\gamma \in (2, 3]$ and its domain extends beyond values that scale as
$N^{1/2}$, disassortative correlations are unavoidable for high
degrees. Almost all real scale-free networks fulfill these conditions
and, hence, it is important to analyze how these negative correlations
constrain the behavior of the clustering coefficient. Let us assume a
power law decay of the average nearest neighbors degree of the form
$\bar{k}_{nn}(k) \sim \kappa k^{-\delta}$. One can prove that this
function diverges in the limit of very large networks as
$\bar{k}_{nn}(k) \sim \langle k^2 \rangle \sim k_c^{3-\gamma}$, where
$k_c$ is the maximum degree of the network [19]. Then, the prefactor
$\kappa$ must scale in the same way which, in turn, implies that the
reduced average nearest neighbors degree behaves as

$$ \bar{k}^r_{nn}(k) \sim k^{3-\gamma-\delta}. \qquad (21) $$

Then, from Eq. (19), the exponent of the degree-dependent clustering
coefficient, $\alpha$, must verify the following inequality

$$ \alpha \ge \gamma + \delta - 2. \qquad (22) $$

Just as an example, in the case of the Internet at the Autonomous
System level, the reported values for these three exponents
($\alpha = 0.75$, $\gamma = 2.1$, and $\delta = 0.5$) satisfy this
inequality close to the limit
($\alpha = 0.75 > \gamma + \delta - 2 = 0.6$).
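The chain of scalings leading to the exponent inequality can be written out as follows. This is a sketch that keeps only the leading power of k in the large-degree limit and assumes that the bound on c(k) scales as the reduced average nearest neighbors degree divided by k.

```latex
\begin{align}
c(k) &\sim k^{-\alpha}, \qquad
\lambda(k) \sim \frac{\bar{k}^r_{nn}(k)}{k} \sim k^{2-\gamma-\delta}
  && \text{(large } k\text{)} \\
c(k) \le \lambda(k)
  &\;\Rightarrow\; -\alpha \le 2-\gamma-\delta
  \;\Rightarrow\; \alpha \ge \gamma+\delta-2 \\
\gamma+\delta-2 &= 2.1 + 0.5 - 2 = 0.6 \le 0.75 = \alpha
  && \text{(AS example)}
\end{align}
```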



VI. UNCORRELATED NETWORKS AND THE 
DISTINCTION BETWEEN WEAK AND 
STRONG CLUSTERING 



When analyzing two-point correlations, the notion of uncorrelated
network corresponds to a network in which the joint distribution
P(k, k') factorizes as

$$ P(k, k') = \frac{k k' P(k) P(k')}{\langle k \rangle^2}. \qquad (23) $$

In the context of triangles, a network is uncorrelated when

$$ Q(k, k', k'') = Q(k) Q(k') Q(k''), \qquad (24) $$

where Q(k) was given in Eq. (12). The question is whether the functions
P(k, k') and Q(k, k', k'') can factorize simultaneously. First, we
restrict ourselves to the study of the factorization of Q(k, k')
instead of Q(k, k', k'') since, owing to the symmetry of the latter as
a tensor, the factorization of Q(k, k') is a sufficient condition for
the factorization of Q(k, k', k''). Indeed, the function Q(k, k')
measures correlations between connected vertices when edges are
weighted by their multiplicity, whereas P(k, k') measures these
correlations when edges are chosen with uniform probability. Given this
difference in the selection mechanism of edges, Q(k, k') and P(k, k')
cannot factorize simultaneously when the sample of edges is highly
heterogeneous in their multiplicity values. In contrast, when $m_{ij}$
is either 0 or 1, the sample of edges corresponding to triangles
becomes homogeneous and, whenever Q(k, k') factorizes, P(k, k')
factorizes too (for degrees larger than 1). In this case, we can write

$$ m_{kk'} \propto (k-1)(k'-1) c(k) c(k'). \qquad (25) $$

Since in this approach $m_{kk'} \le 1$ for all k, k', we have that

$$ c(k) \le \frac{1}{k-1} \qquad (26) $$

for any uncorrelated network at the two-vertex level. In other
situations, one can construct uncorrelated networks at the level of
triangles but, at the same time, some correlations will appear at the
level of edges, and vice versa.

This suggests partitioning the space of clustered networks into two
main categories: weak transitivity, for networks with
$c(k) \le (k-1)^{-1}$ for all k, and strong transitivity in the
opposite case. As we will show in the following paper [23], the
percolation properties of clustered networks are totally different
depending on which one of these categories the network belongs to. This
is related to the fact that, in the strong transitivity regime, the
overlap of triangles is important, thus favoring the emergence of
subgraphs which are tightly interconnected, the so-called k-cores [34].
In contrast, in the weak transitivity class, triangles are mostly
disjoint and the topological properties of such networks are close to
those of unclustered ones.
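The distinction between the two classes can be turned into a simple test on a measured degree-dependent clustering coefficient. The sketch below is only illustrative: the function name, the numerical tolerance, and the two example inputs are assumptions, and the criterion is read as $c(k) \le 1/(k-1)$ for every degree k > 1, so that a degree-2 vertex sitting in a single triangle (c = 1) still counts as weak transitivity.

```python
def transitivity_class(c_of_k, tol=1e-12):
    """Classify a network from its degree-dependent clustering {k: c(k)}.

    Returns "weak" if c(k) <= 1/(k - 1) for every degree k > 1 (up to the
    tolerance tol), and "strong" otherwise.
    """
    for k, c in c_of_k.items():
        if k > 1 and c > 1.0 / (k - 1) + tol:
            return "strong"
    return "weak"


# Hypothetical inputs: two triangles sharing one vertex (degrees 2 and 4) ...
bowtie = {2: 1.0, 4: 1.0 / 3.0}
# ... and two triangles sharing one edge (degrees 2 and 3).
diamond = {2: 1.0, 3: 2.0 / 3.0}

assert transitivity_class(bowtie) == "weak"
assert transitivity_class(diamond) == "strong"
```

In the first example all edges have multiplicity one and the triangles are disjoint in their edges, whereas in the second the shared edge has multiplicity two, which is what pushes c(3) above 1/2.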



VII. CONCLUSIONS 

In this paper, we have provided a new and powerful formalism to
understand transitive relations in complex networks. We have defined a
new fundamental quantity, Q(k, k', k''), which measures the probability
that a randomly chosen triangle connects three vertices of degrees k,
k', and k''. The summation over one variable of this fundamental
distribution gives information about two of the vertices participating
in the triangle and, in a natural way, introduces the multiplicity of
edges among two classes of degrees k and k', $m_{kk'}$. The summation
of Q(k, k', k'') over two of its variables gives information about the
properties of vertices that participate in triangles and, as in the
previous case, naturally defines the degree-dependent clustering
coefficient c(k). To quantify the extent of the correlations encoded in
Q(k, k', k''), we have proposed a new metric, the average nearest
neighbors multiplicity $\bar{m}_{nn}(k)$, finding interesting patterns
when measured in real networks. We have also found that, in real
networks, the edge clustering coefficient, defined as the ratio between
$m_{kk'}$ and min(k - 1, k' - 1), is a weakly dependent function of the
degrees k and k'. This could serve as a basis for the modeling of
clustered networks. This result also suggests that the functional form
of the degree-dependent clustering coefficient is mainly determined by
the two-vertex correlation structure of the network. Last but not
least, we have found the conditions for the simultaneous factorization
of Q(k, k', k'') and P(k, k'). This is only possible if
$c(k) \le (k-1)^{-1}$. This partitions the space of clustered networks
into two main categories: networks with weak transitivity, those that
satisfy $c(k) \le (k-1)^{-1}$, and networks with strong transitivity,
those that do not. In the first class, the multiplicity of edges is
either zero or one and triangles are disjoint. In the second class,
edges are forced to share many triangles, giving rise to highly
interconnected subgraphs. We shall see in the following paper how the
class a network belongs to changes its percolation properties.